INFORMATION-THEORETIC TOOLS FOR PARAMETRIZED COARSE-GRAINING OF
NON-EQUILIBRIUM EXTENDED SYSTEMS*

Abstract. In this paper we focus on the development of new methods suitable for efficient
and reliable coarse-graining of non-equilibrium molecular systems. In this context, we propose error
estimation and controlled-fidelity model reduction methods based on Path-Space Information Theory,
and combine them with statistical parametric estimation of rates for non-equilibrium stationary
processes. The approach we propose extends the applicability of existing information-based methods
for deriving parametrized coarse-grained models to Non-Equilibrium systems with Stationary States
(NESS). In the context of coarse-graining it allows for constructing optimal parametrized Markovian
coarse-grained dynamics, by minimizing information loss (due to coarse-graining) on the path space.
Furthermore, the associated path-space Fisher Information Matrix can provide confidence intervals
for the corresponding parameter estimators. We demonstrate the proposed coarse-graining method in
a non-equilibrium system with diffusing interacting particles, driven by out-of-equilibrium boundary
conditions.

Key words. coarse-grained dynamics, non-equilibrium stationary states, driven diffusion, relative
entropy rate, Fisher information matrix, parametrization, kinetic Monte Carlo, Markov processes

1. Introduction. Non-equilibrium systems at transient or steady state regimes
are typical in applied science and engineering, and are the result of coupling between
different physicochemical mechanisms, driven by external couplings or boundary conditions.
Typical examples include reaction-diffusion systems in heteroepitaxial catalytic
materials, polymeric flows and separation processes in microporous materials,
[5SJ[551[55]. In this paper we develop reliable model-reduction methods capable of handling
extended, non-equilibrium statistical mechanics models and related multi-physics
systems. Model-reduction (or coarse-graining) approaches can often be described
in the context of parameter estimation of parametrized statistical models. However,
atomistic models of materials lead to high-dimensional probability distributions
and/or stochastic processes to which the standard methods of statistical inference
and model discrimination are not directly applicable. The emphasis on information
theory tools is also partly justified since often we are interested in probability density
functions (PDF), typically non-Gaussian, due to the significance of tail events in
complex systems. A primary focus of this paper is on systems with Non-Equilibrium
Steady States (NESS), i.e., systems in which a steady state is reached but the detailed
balance condition is violated and explicit formulas for the stationary distribution, e.g.,
in the form of a Boltzmann distribution, are not available.
Application of information-theoretic methods to the analysis of stochastic models uses
entropy-based techniques for analyzing and estimating a distance between (probability)
measures. The relative entropy (Kullback-Leibler divergence) of two probability
measures $\mu(dx) = \mu(x)\,dx$ and $\nu(dx) = \nu(x)\,dx$,

$$\mathcal{R}(\mu\,|\,\nu) = \int \mu(x)\log\frac{\mu(x)}{\nu(x)}\,dx ,$$

allows us to define a pseudo-distance between two measures. A key property of the
relative entropy $\mathcal{R}(P\,|\,Q)$ is that $\mathcal{R}(P\,|\,Q) \ge 0$ with equality if and only if $P = Q$,
which allows us to view relative entropy as a "distance" (more precisely a semi-metric)
between two probability measures $P$ and $Q$. Moreover, from an information theory
perspective [7], the relative entropy measures loss/change of information. Relative
entropy for high-dimensional systems was used as a measure of loss of information in
coarse-graining [HI [HI [2], and in sensitivity analysis for climate modeling problems [22].
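
As a small illustration of the properties just listed, the following sketch evaluates the relative entropy for discrete distributions. The function name `relative_entropy` and the two example distributions are hypothetical choices made for this illustration only; the discrete sum replaces the integral in the definition.

```python
import math

def relative_entropy(mu, nu):
    """R(mu | nu) = sum_x mu(x) log(mu(x) / nu(x)) for discrete distributions."""
    total = 0.0
    for m, n in zip(mu, nu):
        if m == 0.0:
            continue  # convention: 0 * log 0 = 0
        if n == 0.0:
            return math.inf  # mu is not absolutely continuous w.r.t. nu
        total += m * math.log(m / n)
    return total

# Hypothetical example distributions on three states
mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

print(relative_entropy(mu, nu))  # strictly positive since mu != nu
print(relative_entropy(nu, mu))  # differs from the line above: not symmetric
print(relative_entropy(mu, mu))  # zero when the two measures coincide
```

The asymmetry visible in the second line of output is the reason the relative entropy is referred to as a pseudo-distance rather than a metric.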

Using entropy-based analytical tools has proved essential for deriving rigorous
results for the passage from interacting particle models to mean-field descriptions, [20].
The application of relative entropy methods to the error analysis of coarse-graining of
stochastic particle systems has been introduced and studied in [Ml [TBI [13 [13 [H].
Aside from this rigorous numerical analysis direction, entropy-based computational
techniques were also developed which are used for constructing approximations of
coarse-grained (effective) potentials for models of large biomolecules and polymeric systems
(fluids, melts). Optimal parametrization of effective potentials based on minimizing
the relative entropy between equilibrium Gibbs states, e.g., [6], [5], [3], extended
previously developed inverse Monte Carlo methods, primarily based on force matching
approaches, used in coarse-graining of macromolecules (see, e.g., [31] [U)). In [10]
an extension to dynamics is proposed in the context of Fokker-Planck equations, by
considering the corresponding relative entropy for discrete-time approximations of the
transition probabilities. Furthermore, relative entropy was used as a means to improve
model fidelity in a parametric, multi-model approximation framework of complex
dynamical systems, at least when the models' steady-state distributions are explicitly
known, e.g., are Gaussian, [23].

Overall, such parametrization techniques focus on systems with a known
steady state, such as a Gibbs equilibrium distribution. More specifically, the computational
implementation of optimal parametrization in the inverse Monte Carlo methods
is relatively straightforward for equilibrium systems in which the best-fit procedure
is applied to an explicitly known equilibrium distribution and where relative entropy
is explicitly computable. On the other hand, this is not the case in non-equilibrium
systems, even at a steady state, where typically we do not have a Gibbs structure and
the steady-state distribution is unknown altogether; this sets up one of the primary
challenges for this paper. Indeed, here we show that in non-equilibrium systems the
general information theory ideas based on the Kullback-Leibler divergence are still
applicable, but they have to be properly formulated in the context of Non-equilibrium
Statistical Mechanics by focusing on the probability distribution of the entire time
series, i.e., on the path space of the underlying stochastic processes. We show that,
surprisingly, such a path-space relative entropy formulation is: (a) general in the sense
that it applies to any Markovian model (e.g., Langevin dynamics, Kinetic Monte
Carlo, etc.), and (b) easily computable as an ergodic average in terms of the Relative
Entropy Rate, therefore allowing us to construct optimal parametrized Markovian
coarse-grained dynamics for large classes of models. This procedure involves the
minimization of information loss in path space, where information is inevitably lost due
to the coarse-graining procedure. In fact, the parametrization scheme proposed in [10]
is mathematically justified by reformulating it on the path space using the Relative
Entropy Rate and it is a specific, but reversible (i.e., it has a Gibbs steady state)
example of our methodology. Finally, the path-space Fisher Information Matrix (FIM)
derived from the Relative Entropy Rate (RER) can provide confidence intervals for the
corresponding statistical estimators of the optimal parameter obtained through the
minimization problem.

The paper is structured as follows. In Section 2 we formulate the path-space
information theory tools used in the paper, including the key concept of Relative
Entropy Rate. In Section 3 we present the parametrization method of coarse-grained
dynamics and the connections with maximum likelihood estimators and the Fisher
Information Matrix. In Section 4 we briefly discuss statistical estimators for RER
and FIM. Finally, in Section 5 we demonstrate the proposed coarse-graining method
in a non-equilibrium system with diffusing interacting particles, driven by
out-of-equilibrium boundary conditions.

2. Relative Entropy Rate, path space information theory and error
quantification for non-equilibrium systems. First, we formulate a general
entropy-based error analysis for coarse-graining, dimensional reduction and parametrization
of high-dimensional Markov processes, simulated by Kinetic Monte Carlo (KMC)
and Langevin Dynamics. Typically such systems have Non-Equilibrium Steady States
(NESS) for which detailed balance fails as they are irreversible. The stationary
distributions are not known explicitly and have to be studied computationally. Quantifying
and controlling the coarse-graining error in such systems thus requires developing
computable and efficient methods for estimating distances of probability measures on
the path space. The relative entropy between two path measures $P_{[0,T]}$ and $Q_{[0,T]}$
(see (2.3) for a specific example) for the processes on the interval $[0,T]$ is

$$\mathcal{R}(P_{[0,T]}\,|\,Q_{[0,T]}) = \mathbb{E}_{P_{[0,T]}}\left[\log\frac{dP_{[0,T]}}{dQ_{[0,T]}}\right], \tag{2.1}$$

where $\frac{dP_{[0,T]}}{dQ_{[0,T]}}$ is the Radon-Nikodym derivative of $P_{[0,T]}$ with respect to $Q_{[0,T]}$. If
these probability measures have probability densities $p$, $q$ respectively, (2.1) becomes
$\mathcal{R}(P_{[0,T]}\,|\,Q_{[0,T]}) = \int p\log\left(\frac{p}{q}\right)$. In the setting of coarse-graining or model reduction
the measure $P_{[0,T]}$ is associated with the exact process and $Q_{[0,T]}$ with the
approximating (coarse-grained) process.

From an information theory perspective, the relative entropy measures the loss
of information as we approximate the exact stochastic process $P_{[0,T]}$ with the
coarse-grained one $Q_{[0,T]}$. In general the relative entropy (2.1) in this dynamic setting is
not a computable object; we refer for instance to related formulas in the
Shannon-McMillan-Breiman Theorem, [7]. However, as we show next, in practically relevant
cases of stationary Markov processes we can work with the relative entropy rate

$$\mathcal{H}(P\,|\,Q) = \lim_{T\to\infty}\frac{1}{T}\,\mathcal{R}(P_{[0,T]}\,|\,Q_{[0,T]}), \tag{2.2}$$

where $P$ and $Q$ denote the distributions of the corresponding stationary processes.

Relative Entropy Rate for Markov Chains. In order to explain the basic concept we
restrict to the case of two Markov chains, $\{X_n\}_{n\ge 0}$, $\{\tilde X_n\}_{n\ge 0}$, on the countable state
space $\Sigma$, defined by the transition probability kernels $p(x,x')$ and $q(x,x')$. A typical
example would be the embedded Markov chain used for KMC simulations of a
continuous time Markov chain. In the case of a continuous state space a temporal
discretization of a Langevin process leads to a Markov process with the transition
kernel $p(x,dx') = p_{\Delta t}(x,x')\,dx'$ defined by the time-discretization scheme of the
underlying stochastic dynamics. We assume that the initial states are drawn from the invariant
distributions $\mu(x)$ and $\nu(x)$. The path measure defining the probability of a path
$(x_0, x_1, \dots, x_T)$ is then

$$P(x_0,\dots,x_T) = \mu(x_0)\,p(x_0,x_1)\cdots p(x_{T-1},x_T), \tag{2.3}$$

and similarly for the measure $Q(x_0,\dots,x_T)$. The Radon-Nikodym derivative is easily
computed:

$$\frac{dP}{dQ} = \frac{\mu(x_0)\prod_{i=0}^{T-1} p(x_i,x_{i+1})}{\nu(x_0)\prod_{i=0}^{T-1} q(x_i,x_{i+1})} .$$

Using the fact that the processes are stationary with invariant measures $\mu$ and $\nu$, we
obtain an expression for the relative entropy

$$\mathcal{R}(P\,|\,Q) = T\,\mathbb{E}_\mu\Big[\sum_{x'} p(x,x')\log\frac{p(x,x')}{q(x,x')}\Big] + \mathcal{R}(\mu\,|\,\nu), \tag{2.4}$$

and thus, dividing by $T$ and taking the limit in (2.2), the relative entropy rate is given explicitly as

$$\mathcal{H}(P\,|\,Q) = \mathbb{E}_\mu\Big[\sum_{x'} p(x,x')\log\frac{p(x,x')}{q(x,x')}\Big]. \tag{2.5}$$

We will refer from now on to the quantity (2.5) as the Relative Entropy Rate (RER),
which can be thought of as the change in information per unit time. Notice that RER
has the correct time scaling since it is actually independent of the interval $[0,T]$.
Furthermore, it has the following key features that make it a crucial observable for
simulating and coarse-graining complex dynamics:

(i) The RER formula (2.5) provides a computable observable that can be sampled
from the steady state $\mu$ in terms of conventional Kinetic Monte Carlo (KMC),
bypassing the need for a histogram or an explicit formula for the
high-dimensional probabilities involved in (2.1).
(ii) In stationary regimes, when $T \gg 1$ in (2.4), the term $\mathcal{R}(\mu\,|\,\nu)$ becomes
unimportant. This is especially convenient since $\mu$ and $\nu$ are typically not known
explicitly in non-reversible systems, for instance in reaction-diffusion or
driven-diffusion KMC or non-reversible Langevin dynamics.
In view of these features, we readily see that if we consider a Markov chain $\{\tilde X_n\}_{n\ge 0}$ as
an approximation, e.g. a coarse-graining, of the chain $\{X_n\}_{n\ge 0}$, we can estimate the
loss of information at long times by computing $\mathcal{H}(P\,|\,Q)$ as an ergodic average. This
observation is the starting point of the proposed methodology and relies on the fact
that the observable $\mathcal{H}(P\,|\,Q)$ is computable; efficient statistical estimators for (2.5) are
discussed in Section 4. A similar calculation can be carried out for continuous time
Markov Chains, as we see next.
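
To make feature (i) concrete, the sketch below compares two ways of evaluating the RER of a small discrete-time chain: as a stationary expectation of the local observable $\sum_{x'} p(x,x')\log(p(x,x')/q(x,x'))$, and as a time average of the same observable along a single simulated trajectory. The transition matrices `P` and `Q` and all function names are hypothetical and chosen only for illustration; the invariant measure is obtained here by power iteration, which is feasible only because the state space is tiny.

```python
import math
import random

def stationary(p, iters=2000):
    """Invariant distribution of a transition matrix via power iteration."""
    n = len(p)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * p[x][y] for x in range(n)) for y in range(n)]
    return mu

def local_rer(p, q, x):
    """Local observable sum_y p(x,y) log(p(x,y)/q(x,y)) at state x."""
    return sum(p[x][y] * math.log(p[x][y] / q[x][y])
               for y in range(len(p)) if p[x][y] > 0.0)

def rer_exact(p, q):
    """RER as an expectation under the invariant measure of p."""
    mu = stationary(p)
    return sum(mu[x] * local_rer(p, q, x) for x in range(len(p)))

def rer_ergodic(p, q, steps, seed=0):
    """RER as a time average along one trajectory of the chain p."""
    rng = random.Random(seed)
    states = range(len(p))
    x, total = 0, 0.0
    for _ in range(steps):
        total += local_rer(p, q, x)
        x = rng.choices(states, weights=p[x])[0]
    return total / steps

# Hypothetical "exact" chain P and approximating chain Q on three states
P = [[0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.7, 0.1, 0.2]]
Q = [[0.2, 0.6, 0.2], [0.3, 0.3, 0.4], [0.5, 0.25, 0.25]]

print(rer_exact(P, Q), rer_ergodic(P, Q, steps=50_000))
```

The ergodic version never touches the invariant measure explicitly, which is the point of the construction: only the local transition probabilities along the sampled path are needed.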

Continuous Time Markov chains and Kinetic Monte Carlo. In models of catalytic
reactions the systems are often described by continuous time Markov chains (CTMC)
that are simulated by KMC algorithms. For example, the microscopic Markov process
$\{\sigma_t\}_{t\ge 0}$ describes the evolution of molecules on a substrate lattice. Mathematically the
continuous time Markov chain is defined completely by specifying the local transition
rates $c^\theta(\sigma,\sigma')$ where $\theta\in\mathbb{R}^k$ is a vector of the model parameters. The transition rates
determine the updates from any current state (configuration) $\sigma_t = \sigma$ to a (random)
new state $\sigma'$. In the context of the spatial models considered here, the transition
rates take the form $c^\theta(\sigma,\sigma') = c^\theta(x,\omega,\sigma)$, denoting by $x\in\Lambda_N$ a lattice site on a
$d$-dimensional lattice $\Lambda_N$ and $\omega\in\mathcal{S}_x$, where $\mathcal{S}_x$ is the set of all possible configurations
that correspond to an update in a neighborhood of the site $x$. From local transition
rates one defines the total rate $\lambda^\theta(\sigma) = \sum_{x\in\Lambda_N}\sum_{\omega\in\mathcal{S}_x} c^\theta(x,\omega,\sigma)$, which is the
intensity of the exponential waiting time for a jump from the state $\sigma$. The transition
probabilities for the embedded Markov chain $\{S_n\}_{n\ge 0}$ are $p(\sigma,\sigma';\theta) = c(\sigma,\sigma';\theta)/\lambda(\sigma;\theta)$. In
other words, once the exponential "clock" signals a jump, the system transitions from
the state $\sigma$ to a new configuration $\sigma'$ with the probability $p(\sigma,\sigma')$. In the context of
coarse-graining or hybrid systems we are led to finding an optimal parametrization
for the rates $c(\sigma,\sigma';\theta)$ of a process that approximates the dynamics given by the
microscopic process with rates $c(\sigma,\sigma')$. A similar calculation as in the case of Markov chains
gives the analogue of the formula (2.5),

$$\mathcal{H}(P\,|\,Q) = \mathbb{E}_\mu\Big[\sum_{\sigma'} c(\sigma,\sigma')\log\frac{c(\sigma,\sigma')}{c(\sigma,\sigma';\theta)} - \big(\lambda(\sigma) - \lambda(\sigma;\theta)\big)\Big], \tag{2.6}$$

where $\mu$ is the stationary distribution of the microscopic process and $\lambda$ denotes the total
transition rates. In [13] we used this quantity in order to quantify error in a
two-level coarse-grained kinetic Monte Carlo method. Based on these considerations, we
show in Section 3 that minimizing the error measured by (2.6) leads to a Markovian
coarse-grained dynamics that best approximates the long-time behavior of the microscopic
process projected to the coarse degrees of freedom.
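
For a continuous-time chain with a handful of states the RER can be evaluated directly, which gives a useful sanity check on the sign conventions. The sketch below assumes the form in which the jump term $\sum_{\sigma'} c\log(c/c^\theta)$ enters with a plus sign and the difference of total rates $\lambda - \lambda^\theta$ with a minus sign, the same combination used by the estimators of Section 4. The rate matrices and function names are hypothetical, and the stationary distribution is computed through a uniformized chain purely for convenience.

```python
import math

def total_rates(c):
    """lambda(x) = sum_y c(x, y); the diagonal of c is assumed to be zero."""
    return [sum(row) for row in c]

def ctmc_stationary(c, iters=5000):
    """Invariant distribution of the CTMC via the uniformized chain I + G/L."""
    n = len(c)
    lam = total_rates(c)
    big = 2.0 * max(lam)  # uniformization constant L, larger than every lambda(x)
    mu = [1.0 / n] * n
    for _ in range(iters):
        new = [mu[y] * (1.0 - lam[y] / big) for y in range(n)]
        for x in range(n):
            for y in range(n):
                if x != y:
                    new[y] += mu[x] * c[x][y] / big
        mu = new
    return mu

def rer_ctmc(c, c_theta):
    """E_mu[ sum_y c log(c / c_theta) - (lambda - lambda_theta) ]."""
    mu = ctmc_stationary(c)
    lam, lam_theta = total_rates(c), total_rates(c_theta)
    n = len(c)
    rer = 0.0
    for x in range(n):
        jump = sum(c[x][y] * math.log(c[x][y] / c_theta[x][y])
                   for y in range(n) if c[x][y] > 0.0)
        rer += mu[x] * (jump - (lam[x] - lam_theta[x]))
    return rer

# Hypothetical rates of the exact process and of a model with doubled rates
C = [[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.5, 3.0, 0.0]]
C2 = [[2.0 * r for r in row] for row in C]

print(rer_ctmc(C, C2))
```

When every rate is doubled, the local observable reduces to $\lambda(\sigma)(1 - \log 2)$, so the result equals $(1-\log 2)\,\mathbb{E}_\mu[\lambda]$ and is strictly positive, as a relative entropy rate should be.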

Remark 2.1. We consider the special case where the transition probability
function of the Markov chain is sampled directly from the invariant measure, i.e.

$$p(\sigma,\sigma') = \mu(\sigma'), \quad\text{and}\quad q(\sigma,\sigma') = \nu(\sigma'), \quad\text{for all } \sigma,\sigma'\in\Sigma .$$

This sampling is equivalent to the fact that the path space samples in (2.4) are
independent and identically distributed from the stationary probability distributions.
Then the RER between the path probabilities becomes the usual relative entropy
between the stationary distributions:

$$\mathcal{H}(P\,|\,Q) = \mathcal{R}(\mu\,|\,\nu) . \tag{2.7}$$

Estimating RER using (2.5) is far simpler than directly estimating the relative entropy
$\mathcal{R}(\mu\,|\,\nu)$, since (2.5) only involves local dynamics rather than the full steady state
measure, which typically may not be available. Furthermore, even when it is available
in the form of a Gibbs state, it will require computations that will typically involve a
full Hamiltonian, [6].

Estimation and error of observables. The estimates on relative entropy and RER
can provide an upper bound for a large family of observable functions through
Pinsker's (or Csiszar-Kullback-Pinsker) inequality. The Pinsker inequality states that
the total variation norm between $P_{[0,T]}$ and $Q_{[0,T]}$ is bounded in terms of the relative
entropy, [7]. The Pinsker inequality gives an estimate for the difference of the means
computed with respect to the distributions $P$ and $Q$,

$$\left|\mathbb{E}_{P_{[0,T]}}[f] - \mathbb{E}_{Q_{[0,T]}}[f]\right| \le \|f\|_\infty\sqrt{2\,\mathcal{R}(P_{[0,T]}\,|\,Q_{[0,T]})} . \tag{2.8}$$

An important conclusion that is immediately drawn from the above inequality is that
if the relative entropy of a distribution with respect to another distribution is small
then the error in any bounded observable is accordingly small.
Using (2.4) we readily obtain the estimate

$$\left|\mathbb{E}_{P_{[0,T]}}[f] - \mathbb{E}_{Q_{[0,T]}}[f]\right| \le \|f\|_\infty\sqrt{2T}\sqrt{\mathcal{H}(P\,|\,Q) + \frac{1}{T}\mathcal{R}(\mu\,|\,\nu)} , \tag{2.9}$$

involving the relative entropy rate (2.5) or (2.6). As in virtually all numerical analysis
estimates for stochastic dynamical systems, the bound (2.9) may not be sharp, but it
is indicative of the error in the observables when the distribution $Q$ approximates $P$.
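
The following sketch checks the Pinsker-type bound on a toy example with two discrete distributions standing in for the path measures. The distributions, the observable `f`, and the helper names are hypothetical; the only thing being illustrated is that the difference of means is controlled by $\|f\|_\infty\sqrt{2\mathcal{R}}$, and that the bound need not be tight.

```python
import math

def relative_entropy(p, q):
    """Discrete relative entropy R(p | q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def mean(f, p):
    """Expectation of the observable f under the distribution p."""
    return sum(fx * px for fx, px in zip(f, p))

def pinsker_bound(f, p, q):
    """Right-hand side of the bound: sup|f| * sqrt(2 R(p | q))."""
    return max(abs(fx) for fx in f) * math.sqrt(2.0 * relative_entropy(p, q))

# Hypothetical distributions and a bounded observable on three states
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
f = [1.0, -1.0, 0.5]

gap = abs(mean(f, P) - mean(f, Q))
bound = pinsker_bound(f, P, Q)
print(gap, bound)
```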

3. Parametrization of coarse-grained dynamics and Inverse Dynamic
Monte Carlo. We consider a parametrized class of coarse-grained Markov processes
$\{\eta_t\}_{t\ge 0}$, associated with the fine scale stochastic process $\{\sigma_t\}_{t\ge 0}$. The coarse-graining
procedure is based on projecting the microscopic space $\Sigma$ onto a coarse space $\bar\Sigma$ with
fewer degrees of freedom. We denote the coarse space variables

$$\eta = \mathbf{T}\sigma , \quad\text{where}\quad \mathbf{T}: \Sigma\to\bar\Sigma \tag{3.1}$$

is a coarse-graining (projection) operator, see also (5.2) for a specific example. In the
case of continuous time processes such as Kinetic Monte Carlo, the coarse-grained
stochastic process is defined in terms of coarse transition rates $\bar c(\eta,\eta')$ which capture
macroscopic information from the fine scale rates $c(\sigma,\sigma')$. For example, for stochastic
lattice systems, approximate coarse rate functions are explicitly known from coarse
graining (CG) techniques of [HI [19], see (5.1). Similarly, when we consider temporally
discretized stochastic processes such as Langevin Dynamics, the coarse-grained
process is given in terms of transition probabilities $\bar p(\eta,\eta')$ which capture macroscopic
information from the fine scale transition probabilities $p(\sigma,\sigma')$.

Interpolated dynamics. Given coarse-grained dynamics we can always construct
corresponding microscopic dynamics. For example, given coarse-grained transition
probabilities $\bar p(\eta,\eta')$ with corresponding stationary distribution $\bar\mu$, where the latter is
typically unknown in non-equilibrium systems, we define the corresponding fine-scale
rates

$$q(\sigma,\sigma') = \bar p(\mathbf{T}\sigma,\mathbf{T}\sigma') . \tag{3.2}$$

Here we apply piece-wise constant interpolation for all microscopic states $\sigma$ (resp. $\sigma'$)
corresponding to the same coarse state $\eta$ (resp. $\eta'$) and thus transitions to these states
occur with the same probability rates. The reconstruction step (3.2) is necessary when
we want to compare fine and coarse processes on the path space in terms of the relative
entropy rate, since both processes need to be defined on the same probability space,
see for example (3.5) below. In this paper, for the sake of simplicity, we assume that
all reconstructions are based on (3.2). The reconstruction is obviously not unique and
a complete discussion and related strategies are given in [50].

3.1. Inverse Dynamic Monte Carlo methods. In many applications the
coarse-grained models are defined by effective potentials or effective rates which are
sought in a family of parameter-dependent functions, [31], [23], [21]. The parameters
are then fitted by minimizing certain functionals that attempt to capture different
aspects of modeling errors, e.g., radial distribution functions in [24]. In contrast to such
Inverse Monte Carlo methods applied to equilibrium systems, we cannot work with
equilibrium distributions since the primary information about the non-equilibrium
system is represented by the transition rates and the NESS is not known. Thus we
apply the information-theoretic framework on the path space, i.e., the approximating
measure $Q_{[0,T]} = Q^\theta_{[0,T]}$ depends on the parameters $\theta\in\mathbb{R}^k$ that are fitted using
entropy-based criteria for the best approximation.

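The optimal parametrized coarse-grained transition probabilities $q^\theta(\sigma,\sigma')$ are
constructed as follows. First, given the parametrized coarse-grained transition
probabilities $\bar p^\theta(\eta,\eta')$ we define the fine-scale projected rates $q^\theta(\sigma,\sigma')$, which can be defined,
for instance, by (3.2) as

$$q^\theta(\sigma,\sigma') = \bar p^\theta(\mathbf{T}\sigma,\mathbf{T}\sigma') ,$$

and the corresponding coarse-grained path-distribution is

$$Q^\theta(\sigma_0,\dots,\sigma_T) = \bar\mu(\mathbf{T}\sigma_0)\,q^\theta(\sigma_0,\sigma_1)\cdots q^\theta(\sigma_{T-1},\sigma_T) . \tag{3.3}$$

Subsequently the best fit can be obtained by minimizing the relative entropy rate,
i.e., finding a solution

$$\theta^* = \operatorname{argmin}_\theta\,\mathcal{H}(P\,|\,Q^\theta) , \tag{3.4}$$

where now, in analogy with (2.5), the RER is

$$\mathcal{H}(P\,|\,Q^\theta) = \mathbb{E}_\mu\Big[\sum_{\sigma'} p(\sigma,\sigma')\log\frac{p(\sigma,\sigma')}{q^\theta(\sigma,\sigma')}\Big] . \tag{3.5}$$
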
This optimization problem is, on the one hand, similar to the more common parametric
inference in which the log-likelihood function is maximized, and this perspective will be
further clarified in Section 3.2. Furthermore, due to the parametric identification of
the coarse-grained dynamics, i.e., transition probabilities, or rates in the case of (2.6),
we refer to the proposed methodology as an Inverse Dynamic Monte Carlo method
in analogy to the Inverse Monte Carlo methods for equilibrium systems, [31], [24], [21].
The optimization algorithm for (3.4) is based on iterative procedures that locate
a solution $\theta^*$ of the optimality condition $\nabla_\theta\mathcal{H}(P\,|\,Q^\theta) = 0$ of the following type

$$\theta^{(n+1)} = \theta^{(n)} - \frac{a}{n}\,G^{(n+1)} , \tag{3.6}$$

for some $a > 0$ and $G^{(n+1)}$ being a suitable approximation of the gradient $\nabla_\theta\mathcal{H}(P\,|\,Q^\theta)$,
more precisely $\mathbb{E}[G^{(n+1)}\,|\,G^{(0)},\theta^{(0)},\dots,G^{(n)},\theta^{(n)}] = \nabla_\theta\mathcal{H}(P\,|\,Q^{\theta^{(n)}})$. The crucial
ingredient of this algorithm is an efficient and reliable estimator for the sequence $G^{(n)}$ of
the gradient estimates. Similar to the deterministic case the minimization can be
accelerated by combining this step with the Newton-Raphson method and choosing
the vector $G$ as

$$G^{(n+1)} = \operatorname{Hess}\big(\mathcal{H}(P\,|\,Q^{\theta^{(n)}})\big)^{-1}\,\nabla_\theta\mathcal{H}(P\,|\,Q^{\theta^{(n)}}) . \tag{3.7}$$

While the evaluation of the Hessian $\operatorname{Hess}(\mathcal{H}(P\,|\,Q^\theta))$ presents an additional
computational cost, it also offers additional information about the parametrization, sensitivity
and identifiability of the approximating model, [26]. Indeed the first and the second
derivatives of the rate function $\mathcal{H}(P\,|\,Q^\theta)$ are of the form

$$\nabla_\theta\mathcal{H}(P\,|\,Q^\theta) = -\mathbb{E}_\mu\Big[\sum_{\sigma'} p(\sigma,\sigma')\,\nabla_\theta\log q^\theta(\sigma,\sigma')\Big] \tag{3.8}$$

and

$$F_{\mathcal{H}}(Q^\theta) = -\mathbb{E}_\mu\Big[\sum_{\sigma'} p(\sigma,\sigma')\,\nabla^2_\theta\log q^\theta(\sigma,\sigma')\Big] . \tag{3.9}$$

The Hessian can be interpreted as a dynamic analogue of the Fisher Information
Matrix (FIM) $F_{\mathcal{H}}(Q^\theta)$ on the path space. A similar quantity, in the context of sensitivity
analysis, was recently considered in [26], where the authors also developed efficient
statistical estimators for the derivatives of RER $\partial_{\theta_i}\mathcal{H}(P\,|\,Q^\theta)$ and $\partial^2_{\theta_i\theta_j}\mathcal{H}(P\,|\,Q^\theta)$. We
discuss related estimators in Section 4.
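
A minimal sketch of the Newton-type iteration, under strong simplifying assumptions: the "exact" process is a two-state chain whose invariant measure is computed directly, and the approximating family is a hypothetical one-parameter model in which the next state is 1 with probability $s(\theta) = 1/(1+e^{-\theta})$ regardless of the current state. The gradient and Hessian are evaluated exactly as stationary expectations of $\nabla_\theta \log q^\theta$ and $\nabla^2_\theta \log q^\theta$ rather than sampled, and a full Newton step is taken instead of the damped step $a/n$.

```python
import math

def stationary(p, iters=500):
    """Invariant distribution of a transition matrix via power iteration."""
    n = len(p)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * p[x][y] for x in range(n)) for y in range(n)]
    return mu

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical model: q_theta(x, 1) = s(theta), q_theta(x, 0) = 1 - s(theta)
def dlog_q(theta, y):
    """First derivative of log q_theta(x, y) with respect to theta."""
    s = sigmoid(theta)
    return 1.0 - s if y == 1 else -s

def d2log_q(theta, y):
    """Second derivative of log q_theta(x, y); the same for y = 0 and y = 1."""
    s = sigmoid(theta)
    return -s * (1.0 - s)

def grad_rer(p, mu, theta):
    """Gradient of the RER: -E_mu[ sum_y p(x,y) d/dtheta log q_theta(x,y) ]."""
    return -sum(mu[x] * p[x][y] * dlog_q(theta, y)
                for x in range(2) for y in range(2))

def fim(p, mu, theta):
    """Hessian of the RER: -E_mu[ sum_y p(x,y) d2/dtheta2 log q_theta(x,y) ]."""
    return -sum(mu[x] * p[x][y] * d2log_q(theta, y)
                for x in range(2) for y in range(2))

def newton_fit(p, theta=0.0, iters=20):
    """Newton-Raphson iteration for the optimality condition grad = 0."""
    mu = stationary(p)
    for _ in range(iters):
        theta -= grad_rer(p, mu, theta) / fim(p, mu, theta)
    return theta

P = [[0.9, 0.1], [0.3, 0.7]]  # hypothetical "exact" chain
theta_star = newton_fit(P)
print(theta_star, sigmoid(theta_star))
```

For this family the minimizer can be checked by hand: the gradient equals $s(\theta) - \mu(1)$, so the best memoryless approximation reproduces the stationary probability of state 1.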

Remark 3.1. The proposed approach carries a sufficient level of generality to be
applicable to a wide class of stochastic processes, e.g., Langevin dynamics
and KMC, without restriction on the dimension of the system, provided scalable or
other efficient simulators are available to simulate the observables (2.5) and (2.6).
The proposed parametrized coarse-graining is applicable to any system for which
parametrized coarse-grained models are available, e.g., in coarse-graining of
macromolecules and biomembranes, [25], [31], [24]. An obvious restriction is the requirement that the path
measure $P_{[0,T]}$ be absolutely continuous with respect to $Q_{[0,T]}$; however, it does not
significantly restrict the class of relevant applications as we typically deal with KMC or
Markov chain approximations resulting from a discretization of Molecular Dynamics
with noise. In the latter case, Markov chains obtained by numerical approximations
of stochastic differential equations (SDEs) allow us to compute RER through (2.5)
and can be used for quantification of errors or inverse Monte Carlo fitting for
non-equilibrium or irreversible models in Section 3. For example, the diffusion process
given by the stochastic differential equation with a $d$-dimensional Wiener process
$W(t)$,

$$dX(t) = b(X(t))\,dt + \sqrt{2}\,dW(t) ,$$

can be discretized by the Euler scheme with the time-step $h$,

$$X^{n+1} = X^n + h\,b(X^n) + \sqrt{2h}\,\zeta^n ,$$

where $\zeta^n \sim N(0,1)$. In turn, the scheme is equivalent to a Markov chain with the
transition kernel

$$p(x,x')\,dx' \sim e^{-\frac{1}{4h}|x' - x - h\,b(x)|^2}\,dx' .$$

The parametrization scheme in [10] in the context of Fokker-Planck equations is such
an example for SDEs; this scheme is mathematically justified by using the Relative
Entropy Rate (2.5) and it is a specific, but reversible (i.e., it has a Gibbs steady state)
example of our methodology.
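
To illustrate how a time-discretized SDE fits the Markov chain framework, the sketch below evaluates the local relative entropy between the Euler transition kernels of two one-dimensional diffusions that differ only in their drift. It assumes a noise coefficient of $\sqrt{2}$, so that each kernel is Gaussian with mean $x + h\,b(x)$ and variance $2h$; under that assumption the standard formula for the relative entropy of two Gaussians with equal variance gives $h\,|b(x) - b^\theta(x)|^2/4$ per step. The drifts `b` and `b_theta` and all function names are hypothetical.

```python
import math

def euler_kernel(x, x_new, b, h):
    """Density of X^{n+1} = x_new given X^n = x, assuming noise sqrt(2) dW."""
    mean = x + h * b(x)
    return math.exp(-(x_new - mean) ** 2 / (4.0 * h)) / math.sqrt(4.0 * math.pi * h)

def local_rer_closed_form(x, b, b_theta, h):
    """Relative entropy of two Gaussians with equal variance 2h: h |b - b_theta|^2 / 4."""
    return h * (b(x) - b_theta(x)) ** 2 / 4.0

def local_rer_quadrature(x, b, b_theta, h, n=4001, width=12.0):
    """Same quantity by direct numerical integration of p log(p / q)."""
    sd = math.sqrt(2.0 * h)
    lo = x - width * sd
    dx = 2.0 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * dx
        p = euler_kernel(x, y, b, h)
        q = euler_kernel(x, y, b_theta, h)
        total += p * math.log(p / q) * dx
    return total

def b(x):          # hypothetical drift of the "exact" dynamics
    return -x

def b_theta(x):    # hypothetical parametrized drift with theta = 0.5
    return -0.5 * x

x0, h = 1.0, 0.01
print(local_rer_closed_form(x0, b, b_theta, h), local_rer_quadrature(x0, b, b_theta, h))
```

Averaging this local quantity along a simulated Euler trajectory of the exact dynamics would give the RER per time step as an ergodic average, in the same way as for a chain on a discrete state space.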

3.2. Path-space likelihood methods and data-based parametrization of
coarse-grained dynamics. A different, and asymptotically equivalent, perspective
on parametrizing coarse-grained dynamics relies on viewing the microscopic simulator
as a means of producing statistical data in the form of a time series. Although the
proposed method can be applied to systems simulated by Langevin-type dynamics,
we demonstrate its application to Kinetic Monte Carlo algorithms in Section 5.
The novelty of the presented work lies in (a) deriving the parametrization by optimizing
the information content in the path space compared to the available data, taking
advantage of computable formulas for relative entropy discussed in Section 2, and
(b) using systematically derived, e.g., via cluster expansions, [17], classes of parametric
models giving rise to both statistically identifiable and accurate coarse-grained models.
For the latter point we also refer to the discussion of the example in Section 5.

We consider a fine-scale data set of configurations $\mathcal{D} = \{\sigma_1,\sigma_2,\dots,\sigma_N\}$ obtained
from the KMC algorithm. As is typical in the KMC framework, we assume that
the atomistic model, and the corresponding data set, can be described by a spatial,
continuous-time Markov jump process, ^1]. The path-space measure of this KMC
process, see (2.3) for the Markov chain analogue of the path measure, is parametrized
as $P = P^\theta$. In this sense we assume that for the particular data set $\mathcal{D}$ the "true"
parameter value is $\theta = \theta^*$. Identifying $\theta^*$ amounts, mathematically, to minimizing
the pseudo-distance given by the relative entropy, $\min_\theta\mathcal{R}(P^{\theta^*}\,|\,Q^\theta)$. Furthermore,
following (2.4) it suffices to minimize $\mathcal{H}(P^{\theta^*}\,|\,Q^\theta)$. On the other hand, using the
ergodicity of the fine scale process associated with the data set $\mathcal{D} = \{\sigma_1,\sigma_2,\dots,\sigma_N\}$,
we have the estimators

$$\mathcal{H}(P^{\theta^*}\,|\,Q^\theta) = \lim_{N\to\infty}\hat{\mathcal{H}}_N(P^{\theta^*}\,|\,Q^\theta) , \tag{3.10}$$

where we define the unbiased estimator for RER, see Section 4,

$$\hat{\mathcal{H}}_N(P^{\theta^*}\,|\,Q^\theta) := \frac{1}{N}\sum_{i=1}^{N}\log\frac{p^{\theta^*}(\sigma_i,\sigma_{i+1})}{q^\theta(\sigma_i,\sigma_{i+1})} , \tag{3.11}$$

and $q^\theta(\sigma,\sigma')$ is defined in (3.3). For simplicity in notation we show the estimator
(3.10) for the Markov chain case, where $p^\theta(\sigma,\sigma')$ denotes the transition probability. The
continuous-time case, which is relevant to the coarse-grained KMC simulations in
Section 5, is similar using (2.6).

Therefore, the minimization of the relative entropy rate becomes

$$\min_\theta\hat{\mathcal{H}}_N(P^{\theta^*}\,|\,Q^\theta) = \frac{1}{N}\sum_{i=1}^{N}\log p^{\theta^*}(\sigma_i,\sigma_{i+1}) - \max_\theta\frac{1}{N}\sum_{i=1}^{N}\log q^\theta(\sigma_i,\sigma_{i+1}) , \tag{3.12}$$

where the first term does not depend on $\theta$, so that the optimization does not require
a priori knowledge of $\theta^*$. We define the path space Likelihood as

$$L(\theta;\{\sigma_i\}_{i=1}^{N}) := \frac{1}{N}\sum_{i=1}^{N}\log q^\theta(\sigma_i,\sigma_{i+1}) . \tag{3.13}$$

Note that if the transition probabilities in (3.13) are replaced with a stationary measure
and $N$ corresponding independent samples $\mathcal{D} = \{\sigma_1,\sigma_2,\dots,\sigma_N\}$, then (3.13)
becomes the classical Maximum Likelihood Principle (MLE). In this sense (3.13) is a
Maximum Likelihood for the stationary time series $\mathcal{D} = \{\sigma_1,\sigma_2,\dots,\sigma_N\}$ of the
microscopic process, and thus includes dynamics information. Furthermore, it allows us to
obtain the Markovian best fit from the dynamical simulation and observations on a
single, long-time realization of the process.
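
The path-space likelihood can be tried out on a toy time series. In the sketch below the data are generated by a hypothetical two-state chain `P_TRUE`, and the parametrized family is the set of all two-state transition matrices, $\theta = (a, b)$ with $a$ the probability of jumping $0\to 1$ and $b$ that of jumping $1\to 0$. For this particular family the maximizer of the averaged log-likelihood is known in closed form (the empirical transition frequencies), which makes it easy to check that the likelihood at the fitted parameter is at least as large as at any other parameter value.

```python
import math
import random

def simulate(p, n, seed=1):
    """Generate a single long trajectory of a two-state Markov chain."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n):
        x = 1 if rng.random() < p[x][1] else 0
        path.append(x)
    return path

def q_theta(theta):
    """Hypothetical parametrized family: theta = (a, b)."""
    a, b = theta
    return [[1.0 - a, a], [b, 1.0 - b]]

def log_likelihood(theta, path):
    """Averaged path-space log-likelihood over the observed transitions."""
    q = q_theta(theta)
    pairs = list(zip(path[:-1], path[1:]))
    return sum(math.log(q[x][y]) for x, y in pairs) / len(pairs)

def mle(path):
    """Maximizer of the log-likelihood for this family: empirical frequencies."""
    counts = [[0, 0], [0, 0]]
    for x, y in zip(path[:-1], path[1:]):
        counts[x][y] += 1
    a = counts[0][1] / (counts[0][0] + counts[0][1])
    b = counts[1][0] / (counts[1][0] + counts[1][1])
    return (a, b)

P_TRUE = [[0.9, 0.1], [0.3, 0.7]]
path = simulate(P_TRUE, 20_000)
theta_hat = mle(path)
print(theta_hat, log_likelihood(theta_hat, path))
```

In a realistic coarse-graining problem no closed-form maximizer is available and the likelihood would be maximized numerically, for instance with the iterative schemes of Section 3.1.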

Fisher Information Matrix and Confidence Intervals. The computable Fisher
Information Matrix (FIM) in (3.9) can provide confidence intervals for the corresponding
estimator $\hat\theta_N \approx \theta^*$, based on the asymptotic normality of the MLE estimator $\hat\theta_N$.
Indeed, under additional mild hypotheses on the samples $\mathcal{D} = \{\sigma_1,\sigma_2,\dots,\sigma_N\}$, [5],
this general procedure guarantees convergence in the usual central limit sense

$$\hat\theta_N \to \theta^* \ \text{a.s.} \quad\text{and}\quad N^{1/2}(\hat\theta_N - \theta^*) \to N\big(0, F_{\mathcal{H}}^{-1}(Q^{\theta^*})\big) , \tag{3.14}$$

where the variance is determined by the Fisher Information Matrix $F_{\mathcal{H}}(Q^{\theta^*})$, or
asymptotically by $F_{\mathcal{H}}(Q^{\hat\theta_N})$. Thus estimating the FIM $F_{\mathcal{H}}(Q^{\hat\theta_N})$ using the estimator (4.2) provides
error bars on the computed optimal parameter values $\theta^*$.
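
The sketch below turns the asymptotic normality statement into an approximate 95% confidence interval for a single parameter. It uses a hypothetical two-state chain in which only the jump probability $a$ from state 0 to state 1 is fitted. For this model the path-space FIM, evaluated at $P = Q^a$, is $\mu(0)\,[a\,(1/a)^2 + (1-a)\,(1/(1-a))^2] = \mu(0)/(a(1-a))$, and $\mu(0)$ is replaced by the observed fraction of time spent in state 0. The factor 1.96 is the usual normal quantile.

```python
import math
import random

def simulate(p, n, seed=2):
    """Generate a single long trajectory of a two-state Markov chain."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n):
        x = 1 if rng.random() < p[x][1] else 0
        path.append(x)
    return path

def fit_a(path):
    """MLE of a = q(0, 1) and the number of transitions that start in state 0."""
    n0 = sum(1 for x in path[:-1] if x == 0)
    n01 = sum(1 for x, y in zip(path[:-1], path[1:]) if x == 0 and y == 1)
    return n01 / n0, n0

def fim_a(a, mu0):
    """Path-space FIM for the parameter a, evaluated at P = Q^a."""
    return mu0 * (a / a ** 2 + (1.0 - a) / (1.0 - a) ** 2)

P_TRUE = [[0.9, 0.1], [0.3, 0.7]]  # hypothetical data-generating chain
path = simulate(P_TRUE, 20_000)
N = len(path) - 1

a_hat, n0 = fit_a(path)
F = fim_a(a_hat, n0 / N)
half_width = 1.96 / math.sqrt(N * F)  # from N^{1/2}(a_hat - a*) ~ N(0, 1/F)
print(a_hat - half_width, a_hat + half_width)
```

The interval width scales like $N^{-1/2}$ and shrinks when the chain spends more time in the state from which the parameter can be observed, which is one way the FIM reflects identifiability.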
4. Statistical estimators for RER and FIM. The Relative Entropy Rate
(3.5), as well as the Fisher Information Matrix (3.9), are observables of the stochastic
process and can be estimated as ergodic averages. Thus, both observables are
computationally tractable since they depend only on the local transition quantities. We
give explicit formulas for the case of the continuous-time Markov chain and also refer
to [26].

The first estimator for RER is given by

$$\bar{\mathcal{H}}_1^{(n)}(P\,|\,Q^\theta) = \frac{1}{T}\sum_{i=0}^{n-1}\Delta\tau_i\Big[\sum_{\sigma'\in\Sigma} c(\sigma_i,\sigma')\log\frac{c(\sigma_i,\sigma')}{c^\theta(\sigma_i,\sigma')} - \big(\lambda(\sigma_i) - \lambda^\theta(\sigma_i)\big)\Big] , \tag{4.1}$$

where $\Delta\tau_i$ is an exponential random variable with parameter $\lambda(\sigma_i)$ while $T = \sum_i\Delta\tau_i$
is the total simulation time. The sequence $\{\sigma_i\}_{i=0}^{n}$ is the embedded Markov chain
with transition probabilities $p(\sigma_i,\sigma') = c(\sigma_i,\sigma')/\lambda(\sigma_i)$ at the step $i$ and $c^\theta(\sigma_i,\sigma')$ are the
rates of the parametrized process, e.g., the coarse-grained rates $\bar c^\theta(\mathbf{T}\sigma_i,\mathbf{T}\sigma')$. Notice
that the weight $\Delta\tau_i$, which is the waiting time at the state $\sigma_i$ at each step, is necessary
for the correct estimation of the observable, [11]. Similarly, the estimator for the FIM
is

$$\bar F_1^{(n)} = \frac{1}{T}\sum_{i=0}^{n-1}\Delta\tau_i\sum_{\sigma'\in\Sigma} c^\theta(\sigma_i,\sigma')\,\nabla_\theta\log c^\theta(\sigma_i,\sigma')\,\nabla_\theta\log c^\theta(\sigma_i,\sigma')^{T} . \tag{4.2}$$

The computation of the local transition rates $c(\sigma_i,\sigma')$ for all $\sigma'\in\Sigma$ is needed for the
simulation of the jump Markov process when Monte Carlo methods such as the stochastic
simulation algorithm (SSA), [11], are utilized. Thus, the estimators $\bar{\mathcal{H}}_1^{(n)}$ and $\bar F_1^{(n)}$
present only a minor additional computational cost in the simulation.
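
A minimal sketch of the time-weighted RER estimator on a two-state continuous-time chain, simulated with a basic stochastic simulation loop: draw an exponential waiting time with the total rate of the current state, accumulate the local observable weighted by that waiting time, then jump according to the embedded chain. The rate matrices `C` and `C_THETA` are hypothetical; they were picked so that the exact value can be computed by hand from the stationary distribution $(3/4, 1/4)$, which gives $0.125$.

```python
import math
import random

def rer_estimator(c, c_theta, n_jumps, seed=3):
    """Time-weighted ergodic average of the local RER observable along an SSA path."""
    rng = random.Random(seed)
    n = len(c)
    lam = [sum(row) for row in c]
    lam_theta = [sum(row) for row in c_theta]
    # Local observable: sum_y c log(c / c_theta) - (lambda - lambda_theta)
    local = [sum(c[x][y] * math.log(c[x][y] / c_theta[x][y])
                 for y in range(n) if c[x][y] > 0.0) - (lam[x] - lam_theta[x])
             for x in range(n)]
    x, total_time, weighted = 0, 0.0, 0.0
    for _ in range(n_jumps):
        dt = rng.expovariate(lam[x])  # waiting time with intensity lambda(x)
        weighted += dt * local[x]
        total_time += dt
        x = rng.choices(range(n), weights=c[x])[0]  # embedded chain c / lambda
    return weighted / total_time

# Hypothetical exact rates and parametrized rates (zero diagonal)
C = [[0.0, 1.0], [3.0, 0.0]]
C_THETA = [[0.0, 1.5], [2.0, 0.0]]

print(rer_estimator(C, C_THETA, n_jumps=40_000))
```

Dropping the weights `dt` would average over the embedded chain instead of over physical time and would bias the result toward states with short residence times.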

The second numerical estimator for RER is based on the Girsanov representation
of the Radon-Nikodym derivative and it is given by

$$\bar{\mathcal{H}}_2^{(n)}(P\,|\,Q^\theta) = \frac{1}{n}\sum_{i=0}^{n-1}\log\frac{c(\sigma_i,\sigma_{i+1})}{c^\theta(\sigma_i,\sigma_{i+1})} - \frac{1}{T}\sum_{i=0}^{n-1}\Delta\tau_i\big(\lambda(\sigma_i) - \lambda^\theta(\sigma_i)\big) . \tag{4.3}$$

Similarly we can construct an FIM estimator. The term in (4.3) involving logarithms
should not be weighted since the counting measure is approximated with this
estimator. Unfortunately, the estimator (4.3) has the same computational cost as (4.1)
due to the need for the computation of the total rate, which is the sum of the local
transition rates. Furthermore, in terms of the variance, the latter estimator has worse
performance due to the discarded sum over the states $\sigma'$. For more details we also
refer to ^5].

5. Benchmark: coarse-grained driven Arrhenius diffusion of interact- 
ing particles.. We demonstrate the proposed methodology on an example of a 
driven, non-equilibrium diffusion process of interacting particles formulated as a lattice 
gas model with spin variables (t{x) G {0, 1} at lattice sites x e Ajy. 

This is a prototype driven system introduced in [32] as a model system for the
influence of microscopic dynamics on macroscopic behavior in separation problems. This
model problem is intimately related to works on the structure of non-equilibrium
steady states (NESS), [HI [27], as well as to the general formalism of non-equilibrium
statistical mechanics, [H [T^]. The evolution of particles is described in the context
of the lattice-gas model as an exchange dynamics with the Arrhenius migration rate
from the site $x \in \Lambda_N$ to the nearest-neighbor sites $|y - x| = 1$

$$c(x,y,\sigma) = d\,e^{-\beta U(x,\sigma)}\big[\sigma(x)(1 - \sigma(x+1)) + \sigma(x)(1 - \sigma(x-1))\big] ,$$

which describes the diffusion of a particle at $x$ moving to $y$ and interacting through a
two-body potential $J(x - y)$ and with an external field $h$ defining an energy barrier
$U(x,\sigma) = \sum_{z \neq x} J(x - z)\sigma(z) - h$. The continuous-time Markov chain is defined by
its rates and updates to new configurations $\sigma^{x,y}$ in which the spin variables $\sigma(x)$ and
$\sigma(y)$ exchange their values.
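
A minimal sketch of these microscopic ingredients is given below. The function names, the list representation of $\sigma$, and the treatment of sites outside the lattice (rate zero) are assumptions made for illustration. The rate is written per target site $y$, i.e. with the factor $\sigma(x)(1-\sigma(y))$; summing it over $y = x \pm 1$ gives the bracket in the displayed formula.

```python
import math


def constant_potential(J0, L):
    """Two-body potential J(r) = J0 for |r| < L and 0 otherwise."""
    return lambda r: J0 if abs(r) < L else 0.0


def energy_barrier(sigma, x, J, h):
    """U(x, sigma) = sum_{z != x} J(x - z) sigma(z) - h."""
    return sum(J(x - z) * sigma[z] for z in range(len(sigma)) if z != x) - h


def exchange_rate(sigma, x, y, J, h, beta=1.0, d=1.0):
    """Arrhenius rate for a particle at x hopping to the nearest neighbor y."""
    if abs(x - y) != 1 or not (0 <= y < len(sigma)):
        return 0.0
    occupancy = sigma[x] * (1 - sigma[y])      # x occupied and y empty
    return d * math.exp(-beta * energy_barrier(sigma, x, J, h)) * occupancy


def exchange(sigma, x, y):
    """Return the configuration sigma^{x,y} with the values at x and y swapped."""
    new = list(sigma)
    new[x], new[y] = new[y], new[x]
    return new


if __name__ == "__main__":
    sigma = [1, 0, 1, 1]
    J = constant_potential(J0=0.5, L=3)
    print(exchange_rate(sigma, 0, 1, J, h=0.2), exchange(sigma, 0, 1))
```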

Under the assumption of a local equilibrium a straightforward local averaging 
yields the coarse-grained rates, ^^, 

$$\bar{c}(k,l,\eta) = \frac{1}{q}\,\eta(k)\,(q - \eta(l))\,d\,e^{-\beta \bar{U}(k,\eta)} , \tag{5.1}$$

for the lattice-gas model with local concentrations ri{k) defined as the number of 
particles in a coarse cell of the size q; in fact, according to p.l[) we define the coarse 
graining operator 

$$\eta(k) = \mathbf{T}\sigma(k) = \sum_{x \in C_k} \sigma(x) , \tag{5.2}$$

where $\eta(k) \in \{0, \ldots, q\}$. Keeping the two-body interactions as a basis for the
coarse-grained approximation, the effective potentials between block spins $k$ and $l$ are
obtained by a straightforward spatial averaging



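$$\bar{J}(k,l) = \frac{1}{q^2} \sum_{x \in C_k} \sum_{y \in C_l} J(x - y) , \qquad k \neq l ,$$

$$\bar{J}(k,k) = \frac{1}{q(q-1)} \sum_{x \in C_k} \sum_{y \in C_k,\, y \neq x} J(x - y) .$$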
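
The coarse-graining operator and the averaged potentials can be sketched as below. The cell layout $C_k = \{kq, \ldots, (k+1)q - 1\}$, the function names, and the normalizations $1/q^2$ (distinct cells) and $1/(q(q-1))$ (within one cell) are assumptions of this illustration, taken from the usual block-averaging construction rather than confirmed by the text.

```python
def coarse_grain(sigma, q):
    """eta(k) = number of particles in the k-th cell of q consecutive sites."""
    assert len(sigma) % q == 0
    return [sum(sigma[k * q:(k + 1) * q]) for k in range(len(sigma) // q)]


def coarse_potential(J, q, k, l):
    """Average J(x - y) over all pairs x in C_k, y in C_l (y != x inside a cell)."""
    cell_k = range(k * q, (k + 1) * q)
    cell_l = range(l * q, (l + 1) * q)
    if k != l:
        return sum(J(x - y) for x in cell_k for y in cell_l) / q ** 2
    if q == 1:
        return 0.0  # a single-site cell has no internal pairs
    return sum(J(x - y) for x in cell_k for y in cell_k if y != x) / (q * (q - 1))


if __name__ == "__main__":
    J = lambda r: 0.5 if abs(r) < 2 else 0.0   # hypothetical short-range potential
    print(coarse_grain([1, 0, 1, 1], q=2))      # particle counts per cell
    print(coarse_potential(J, 2, 0, 1))         # only one of four pairs interacts
```

When the interaction range is much larger than $q$, every pair of sites in two nearby cells interacts and the average simply returns $J_0$; for a range comparable to $q$ most pairs drop out, which is where the averaged potential loses accuracy.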

Assuming that the approximating dynamics is of Arrhenius type we obtain the energy
barriers

$$\bar{U}(k,\eta) = \sum_{l \neq k} \bar{J}(k,l)\,\eta(l) + \bar{J}(0,0)\,(\eta(k) - 1) - h .$$

The resulting dynamics is a Markovian approximation of the coarse-grained evolution
and it is defined as a CTMC with the rates $\bar{c}(k, l, \eta)$.
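
A possible implementation of the coarse-grained barrier and rate is sketched below. The prefactor in front of $\eta(k)(q - \eta(l))$ is not legible in the source, so it is exposed as the argument `prefactor` with an assumed default of $1/q$; the self-interaction term uses `Jbar(k, k)`, which equals $\bar{J}(0,0)$ for a translation-invariant potential. All names are hypothetical.

```python
import math


def cg_barrier(eta, k, Jbar, h):
    """U_bar(k, eta) = sum_{l != k} Jbar(k, l) eta(l) + Jbar(k, k) (eta(k) - 1) - h."""
    others = sum(Jbar(k, l) * eta[l] for l in range(len(eta)) if l != k)
    return others + Jbar(k, k) * (eta[k] - 1) - h


def cg_rate(eta, k, l, Jbar, h, q, beta=1.0, d=1.0, prefactor=None):
    """Coarse-grained Arrhenius rate for moving one particle from cell k to cell l."""
    if abs(k - l) != 1 or not (0 <= l < len(eta)):
        return 0.0
    if prefactor is None:
        prefactor = 1.0 / q                     # assumed normalization
    counting = eta[k] * (q - eta[l])            # particles in k times vacancies in l
    return prefactor * counting * d * math.exp(-beta * cg_barrier(eta, k, Jbar, h))


if __name__ == "__main__":
    eta = [2, 0, 1]
    Jbar = lambda k, l: 0.25                    # hypothetical constant effective potential
    print(cg_rate(eta, 0, 1, Jbar, h=0.1, q=2))
```

The rate vanishes automatically when the source cell is empty or the target cell is full, so no separate exclusion check is needed at the coarse level.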

As a prototype example of the interactions we consider the constant potential
$J(x) = J_0$ for $|x| < L$ and $J(x) = 0$ otherwise. The effective potential is parametrized
by a single parameter $\theta = J_0$ corresponding to the strength of the coarse-grained
interactions. The system is driven by the concentration gradient given by different
concentrations at the boundary sites $x = 0$ and $x = N$. In the long-time behavior
the distribution converges to a stationary distribution that gives rise to a NESS
concentration profile across the computational domain.

The local mean-field approximation, which defines the interaction potential $\bar{J}(k-l)$
between two block spins $\eta(k)$ and $\eta(l)$ by averaging contributions from all spin-spin
interactions in the cells, does not provide a good approximation, as demonstrated in
Figure 5.1(a), where the inset depicts the error estimated in terms of the relative
entropy rate $\mathcal{H}(P\,|\,\bar{P})$. However, the mean-field potential $\bar{J}$ is a good initial datum for
the optimization.

In this benchmark we stay in the family of two-body potentials and choose to
fit only a single parameter that defines the total strength of the interaction. Thus
the rates are parametrized by the effective potential $\bar{J}(\cdot\,;\theta)$ using a single parameter
only. The best fit was obtained by solving the minimization problem (3.4), hence
minimizing the error defined by $\mathcal{H}(P\,|\,\bar{P}^\theta)$. Figure 5.1(b) depicts concentration
profiles for different sizes $q$ of the coarse cells. The dashed lines represent results from
simulations with mean-field interactions between cells only (i.e., the initial guess in
the optimization), while the solid lines represent simulations with the parametrized
effective interactions.
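
To make the one-parameter fit concrete, the following sketch minimizes the relative entropy rate for a small hypothetical birth-death chain, not for the lattice-gas benchmark itself. It assumes rates that are log-linear in $\theta$, so that the second derivative of the RER equals a Fisher-information-type sum and a Newton iteration can be used; the stationary distribution is computed exactly from detailed balance instead of being estimated along a trajectory. The reference rates are generated inside the same family (`THETA_REF`), so the minimizer is known in advance and serves as a check.

```python
import math

K = 5             # hypothetical toy state space {0, ..., K}
THETA_REF = 0.3   # parameter that generates the reference rates


def down_rate(s, theta):
    """Arrhenius-type rate: each of s particles feels a barrier theta * (s - 1)."""
    return s * math.exp(-theta * (s - 1))


def stationary(theta):
    """Stationary law of the birth-death chain with up-rate 1 (detailed balance)."""
    w = [1.0]
    for s in range(K):
        w.append(w[-1] / down_rate(s + 1, theta))
    z = sum(w)
    return [x / z for x in w]


MU = stationary(THETA_REF)


def rer(theta):
    """Relative entropy rate of the reference chain with respect to the model."""
    total = 0.0
    for s in range(1, K + 1):
        c, ct = down_rate(s, THETA_REF), down_rate(s, theta)
        total += MU[s] * (c * math.log(c / ct) - (c - ct))   # up-rates cancel
    return total


def grad_and_fim(theta):
    """Derivative of the RER and the Fisher-information-type second derivative."""
    g = f = 0.0
    for s in range(1, K + 1):
        c, ct = down_rate(s, THETA_REF), down_rate(s, theta)
        dlog = -(s - 1)                       # d/dtheta log c_theta
        g += MU[s] * (-(c - ct) * dlog)
        f += MU[s] * ct * dlog ** 2
    return g, f


def fit(theta=0.0, tol=1e-12, max_iter=100):
    """Newton iteration theta <- theta - g / f, started from an initial guess."""
    for _ in range(max_iter):
        g, f = grad_and_fim(theta)
        if abs(g) < tol:
            break
        theta -= g / f
    return theta


if __name__ == "__main__":
    print(fit(), rer(0.0), rer(fit()))
```

In a simulation-based setting the exact sums over `MU` would be replaced by the time-weighted trajectory averages, and the initial guess `theta=0.0` by the mean-field value.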

Comparison with the profile obtained from the microscopic simulation (the solid
black line) clearly indicates that when the coarse-graining size $q$ becomes close to the
interaction range $L$ of the microscopic potential $J$, the best fit in a one-parameter
family is not sufficient for obtaining a good approximation, and a better candidate class
of models, in this case coarse-grained (CG) dynamics $\bar{c}(k, l, \eta)$, needs to be found for
improved parametrization. Indeed, in [1, 17] we showed that coarse-grained, multi-body
cluster Hamiltonians provide such a parametrization. More specifically, in [17]
we demonstrated through rigorous cluster expansions that two-body CG approximations
(typical in the state-of-the-art) break down at lower temperatures and/or for
short-range particle-particle interactions, and that additional multi-body CG terms need
to be included in order for the CG model to capture accurately phase
transitions and other physically important properties. Hence, no parametrization can
consistently address this issue unless the proper class of parametric models is
identified first.

(a) Stationary concentration profiles for different cell sizes $q$ with mean-field
interactions. The inset depicts errors at different $q$ estimated by $\mathcal{H}$.
(b) Stationary concentration profiles with fitted $\bar{J}(k; \theta^*)$.
Fig. 5.1. Coarse-grained simulations of driven diffusion of interacting particles without (a) and
with (b) fitted effective interactions.

(a) Convergence of estimates $\theta_n$ to the optimal value $\theta^*$.
(b) Dependence of $\mathcal{H}$ on the parameter $\theta$.

Fig. 5.2. Minimization of $\mathcal{H}$. Panel (a) depicts the convergence of the estimators $\theta_n$ to the
optimal value $\theta^*$ and of the gradient (derivative) $\nabla_\theta \mathcal{H}$ to the optimality condition
$\nabla_\theta \mathcal{H}(\theta^*) = 0$. The right plots depict convergence of confidence intervals for the estimators.
Panel (b) demonstrates the convexity of $\mathcal{H}$ with respect to $\theta$, which holds due to the particular
choice $\theta = \beta J_0$ in the benchmark.